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Abstract. We present a novel method to generate a synthetic distribution of objects (mock) 
on a spherical surface (i.e. a sky), by using a real distribution. The resulting surrogate map 
mimics the clustering features of the real data, including the effects of non-uniform exposure, 
if any. The method is model-independent, also preserving the angular correlation function, 
as well as the angular power spectrum, of the original data. It can be reliably adopted to 
mimic the angular clustering of objects distributed on a spherical surface and it can be easily 
extended to include further information, as the spatial clustering of objects distributed inside 
a sphere. Applications to real data are presented and discussed. In particular, we consider the 
distribution of galaxies recently presented in the 2MASS Redshift Survey (2MRS) catalog. 
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1 Introduction 

Redshift surveys reveal that the distribution of matter in the Universe appears to follow 
patterns typical of strongly correlated stochastic processes, from smallest to largest scales 
[1-3] . Numerical simulations of nonlinear gravitational clustering in Universes dominated by 
weakly interacting cold dark matter have been shown to nicely reproduce the salient features 
of the observed clustering, as well as its formation and evolution [4]. Recently, simulations 
of the growth of dark matter structure, from high redshifts to the present time, have shown 
a considerable agreement with current models for the formation of structure in the Universe 
[5]. Such simulations are able to provide very realistic distributions of matter in the Universe. 

During the last decades the search for statistically significant clustering of such a matter 
has played a crucial role for understanding the cosmological structure formation. A standard 
approach involves the estimation of the statistical autocorrelation function of the matter 
density field. For such a purpose, many estimators have been proposed and widely used to 
capture clustering structures by measuring the deviation from isotropy of angular or radial 
distributions [2, 6, 7]. 

Under the assumption that matter is randomly distributed in a space V according to a 
Poisson process, the number of objects at a distance r is given by 



where n is the expected number density of objects in V, dVi and dV2 are two volume ele- 
ments. However, if the underlying distribution is not Poissonian, the density field is subjected 
to fluctuations inducing clustering of objects. Such an inhomogeneity can be captured by 
inspecting the two-point autocorrelation function (,i2i'r') (2ptACF), defined by 



If dNi2 is normalized to the whole volume of the space V, the above definition suggests that 
the 2ptACF provides a measure of the probability to find an excess of objects at the scale r. 

Within the present work, we show how to generate random realizations (i.e. surrogates) 
of a true distribution of objects, which preserve the angular clustering features of the original 
data. Our method is rather general: it can be extended to higher dimensions and it is 
suitable for many different applications. In particular, we consider the case of astrophysical 
and cosmological studies, where it can be used to build catalogs of objects which mimic 
the clustering features of real matter in the Universe. For instance, such surrogate catalogs 
can be used to train new or existing methods for clustering detection [1, 8-11], and they 
are suitable to simulate realistic distributions of sources of ultra-high energy cosmic rays, in 
anisotropy or source density studies [12-15]. 

The paper is organized as follows: in Sec. 2 we briefly describe the standard correlation 
function adopted to investigate the clustering of astrophysical objects and we show its rela- 
tionship with the corresponding power spectrum. In Sec. 3 we present our method to produce 
surrogate catalogs, based on phase randomization in the spherical harmonics space. We will 
show that the angular power spectrum of the surrogates matches that of the original catalog, 
and, by consequence, that the corresponding correlation function is preserved. We consider 
the application to objects synthetically distributed on the sphere and to real-world objects, 
namely the galaxies in the 2MASS Redshift Survey catalog (2MRS) [16]. 



dNi2 = n'^dVidV2 



(1.1) 
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2 Clustering identification 



2.1 The correlation function 

If p(r) denotes the density field as a function of the position r, and 

p(r) - (p(r)) 
= (P(r)) 

indicates the density field perturbation, the standard definition of the statistical 2ptACF is 
given by 

e(r) = (5(x)5(x + r)), (2.2) 

(•) denoting an averaging over the normalization volume V [17]. More general measures, 
assuming an underlying fractal structure of the matter distribution, make use of modified 
correlation functions based on an average density depending on the position and on the size 
of the sample volume considered, in the framework of fractal analysis [18-21]. 

Because of the lack of large redshift samples, the first estimations of ,^(r) has been given 
by the two-point angular correlation function u}{0), providing a weighted projection of ^(r) 
on a two-dimensional sky through the Limber equation [1] 

to{6) = J dyyU^ J dx^(^Vx^ + y^6^') , (2.3) 

where (p is the radial selection function for the considered sky survey. 

Hence, the problem of estimating ^(r) in a three-dimensional space has been reduced to 
the problem of estimating io{0) on a two-dimensional sky. If n is the number of objects on 
the projected spherical surface 5 of V and m is the number of objects coming from several 
Monte Carlo realizations of S, the angular autocorrelation function can be simply estimated 
by looking for excesses of objects, at a given angular scale, with respect to the isotropic 
expectation. Many estimators have been introduced for such a purpose, whose differences are 
mainly related to the biased estimation they provide. For sake of completeness, we mention 
here the most known: Peebles [1], Davis-Peebles, Landy-Szalay and Hamilton [2, 6, 7] angular 
correlation functions, respectively defined as 

m(m-l) DD(e) , , 

2m DDiQ) , , 

m{m-l)DD{e) m-l DR{9) , ^ 
'^"^(^^ = n(n - 1) RR{e) " ^TRim + ^' ^^'^^ 

4nm DDje) x RR{9) 
'^hW= ^n-l){m-l) Dim ' 

where DD is the number of pairs lying between 9 and 9+A9 for the experimental distribution, 
RR is the same number calculated for Monte Carlo realizations and DR is the cross-pair 
counts between experimental and simulated events. By definition (see Ref. [1]) the functions 
(2.4)-(2.7) are strictly related to the excess on the area dfi of the pairs number dA^pairs over 
the random background dNuc- 

diVpairs - dNuc = ^nnouj{9)dn, (2.8) 
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where uq is the expected average density of events on S assuming an isotropic distribution. 

The large number of objects in the sky surveys, from the earhest to the latest ones, have 
ensured a good estimation of u}{9), further reducing the error on the more recent estimates. 
On small angular scales a power-law behaviour 

-W=(^)'" (2.9) 

has been observed, where 7 ~ 1.8 [1, 22] is the index of the spatial autocorrelation counterpart 




(2.10) 



where r has been replaced by r, namely the distance between two objects in the space, 
and vq ~ 5h~^ Mpc, generally indicated as the correlation length, is the distance such that 
the clustering strength does not exceed the homogeneous expectation. The exponent 7 is 
somehow universal: it is the same for galaxies, cluster of galaxies and superclusters in the 
Universe. However, the correlation function shows a sudden break around r > 10h~^ Mpc, 
above which ^(r) rapidly approaches zero. Moreover, it is worth remarking that a direct 
estimation of ^{r) is difficult because of the effects induced by galaxy peculiar velocities, 
directly related to the evolution of the density field through conservation of mass [17, 23- 
25]. Such effects induce ^(r) to deviate from a power-law, mainly because separations are 
estimated in the redshift space, instead of the real three-dimensional space [22]. The direct 
consequence is a flatter correlation function than ^(r) in the real-space, although, fortunately, 
reliable corrections exist [26-28]. 



2.2 The power spectrum 

Let us suppose that the field d{r), defined by Eq. (2.1), is periodic within a box of size L 
and let us consider the correlation function of Eq. (2.2). By expanding the density field 
perturbation 5{r) in its Fourier modes, we obtain 

where, because of the periodic boundary conditions fcj = n27r/L (i = x, y, z), the cross terms 
with k -|- k' 7^ average to zero. Indeed, by taking into account that ^(r) is a real function, 
it follows 5_k = and 

Hence, by allowing the box size to become arbitrarely large, we can rewrite the inner sum 
as an integral in the k— space and we can express ^(r) by volume averaging the resulting 
expression as 

^(r) = J^I I'JkPe-'^-dk, (2.11) 



-4- 



where P(k) = (|(5kP) is the power spectrum of 5(x), quantifying the scale-dependence of 
density perturbations in the Fourier space [23]. Eq. (2.11) is commonly known as the Wiener- 
Khinchin-Kolmogorov theorem, stating that the autocorrelation function is the Fourier trans- 
form of the power spectrum, or, equivalently, that ^(r) and 'P(k) are Fourier pairs. 

By volume averaging, we have implicitly assumed the ergodicity of the stochastic process 
underlying the field d{r): in practice, this requirement is only supposed to be satisfied if a 
sufficiently large volume is considered [23]. 

3 Surrogate distributions 

In this section we present a method to generate a surrogate catalog of objects mimicking the 
clustering features of real catalogs of objects. Because of the Wiener-Khinchin-Kolmogorov 
theorem, two different skies of objects with the same angular (spatial) power spectrum have 
the same angular (spatial) correlation function. Therefore, any distribution of objects pre- 
serving the power spectrum Pdata will preserve the clustering features of the real distribution. 
In the following, we will present a method which preserves such an angular power spectrum 
by means of a phase randomization approach in the Fourier space. 

In the particular case of the angular distribution on the sphere, the formalism of spher- 
ical harmonics can be adopted. Although our approach is equivalent in the case of a multi- 
dimensional distribution, in the following we will focus on the two-dimensional case. 

Let {6, 4>) indicate a direction on the unit sphere. Moreover, at this step we assume a 
uniform exposure, whereas the case of a non-uniform exposure will be considered further in 
the text. Any square-integrable function f{0,(j)) representing, for instance, an intensity map, 
can be expanded as a linear combination: 

oo I 

f{e,(t>) = ^ ^ aemYeUOA) (3.1) 
i=0 m=-e 

where aem are the multipole coefficients and the functions 



YUeA) = J^^^7^-4T^^-(cose)e-<^ (3.2) 
Y 47r {t + m)\ 

represent the spherical harmonics, Pim indicates the Legendre polynomials and i indicates the 
multipole moment. It is straightforward to show that spherical harmonics forms a complete 
set of orthonormal functions such that 

/ / Yim{d,(t>)Ye>rn'{0,(l))sin9d(t>d9 = 6u>6mra', (3.3) 

Jo Jo 

being 6ij the Kronecker delta. Finally, the observed angular power spectrum for the intensity 
map is defined by 

1 ^ 

m=—£ 

as a function of the multipole moment i. The investigation of Ci provides the same amount of 
information about clustering and anisotropy obtained from the investigation of 7'(k), defined 
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Figure 1. Sky map, realized with. HEALPix, showing a representative random realization of an 
isotropic distribution of objects in the sky (left panel) and a surrogate sky obtained from such a 
distribution (right panel). Color codes the fraction of objects falling in a pixel, while a Gaussian 
beam with 3° dispersion is used. 



at the end of the previous section. The angular correlation function can be obtained from 
the Fourier transform of Ci on the unit sphere. 

Our method is based on the fact that any change of the phases in the coefficients a^mi 
does not affect neither the magnitudes nor the angular power spectrum Ci. Quantitatively, 
is invariant under the transformation airn — ^ o^mC*'^^'", where ^pirn is a real number. The 
dependence on i and m is stressed because, in principle, a different phase can be associated 
to each multipole coefficient. 

The method of phase randomization requires that ^pim is sampled from a stochastic 
variable, uniformly distributed in the interval between and 27r. Hence, the new multipole 
coefficients define a new intensity map /'(0,(/)), while the corresponding sky provides a 
new random realization of the data, which preserves the clustering features. We choose to 
use HEALPix [29] to perform the discrete spherical harmonic transform of the skies. In the 
following, we indicate with S the spherical surface where the whole sky is defined. In general, 
it may happen that only a fraction Sq of S can be physically observed or it can be of interest 
for some reason. For a given pixelization of the sphere, we introduce a weight function, along 
any direction [9, (j)) on the unit sphere, defined by 



uj{6, (j)) oc 



n-^ if (0,0) €5o 

otherwise, 



which assigns a weight proportional to to the pixel corresponding to the direction (6, cj)) 
of an object, being VLp the solid angle covered by that pixel. Hence, the coefficients in the 
expansion given by Eq. (3.1) include the information provided by the weighting procedure. 
In the following, we distinguish two cases: i) sky with uniform coverage, i.e. Sq = S, and ii) 
sky with non-uniform coverage, i.e. Sq C S. 

3.1 Uniform sky 

In order to illustrate our procedure, we consider two examples over the whole sky: i) an 
isotropic distribution of objects in the sky (ISO toy model); ii) a distribution of objects 
where the 50% is clustered around 10 hot spots randomly distributed in the sky, while the 
remaining objects are uniformly distributed (SRC toy model). 

In Fig. 1 we show the density map of the ISO toy model (left panel) and a realization of 
the corresponding surrogate map (right panel). In Fig. 2 we show the same maps in the case 
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of the SRC toy model. Even from a simple by eye-inspection, it is evident that our procedure 
preserves the underlying structures of the original distribution of objects. 

Hence, we have estimated the angular power spectrum for each density map, in order 
to verify if it is preserved as well. In Fig. 3 we show the angular power spectra corresponding 
to the density sky maps of our toy models. In any case our procedure correctly preserves the 
spectrum. In the case of the isotropic sky maps (see Fig. 1), the power spectrum is nearly 
flat, as expected for a typical "white noise" process, where objects are uniformly distributed 
on the sphere. In the case of the sky map of sources and background, the angular power 
spectrum has not a trivial shape. In any case, it is worth remarking that such results can be 
obtained for any shape of the power spectrum, and, by consequence, for any distribution of 
objects on the sphere, under the hypothesis of a uniform exposure. 
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Figure 3. Left panel: angular power spectra of intensity maps shown in Fig. 1 and Fig. 2. Spectra 
for ISO (Fig. 1, left panel), ISO-SURR (Fig. 1, right panel), SRC (Fig. 2, left panel) and SRG-SURR 
(Fig. 2, right panel) are shown. Right panel: Angular power spectra for 2MRS catalog of galaxies 
(2MRS), a phase-randomized surrogate (2MRS-PR) and a surrogate accounting for the mask (2MRS- 
SURR), corresponding respectively to upper, middle and lower panels of Fig. 4. 



3.2 Sky including a mask 

At variance with the previous case, it may happen that the data are provided by observations 
of a sky with non-uniform coverage. In this case, a simple renormalisation can be applied to 
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Figure 4. Same as in Fig. 1, in the case of the 2MRS catalog of galaxies (upper panel). The simple 
phase-randomized surrogate of the catalog is shown in the middle panel: the information about the 
mask is lost. After taking into account such an information in the phase-randomization procedure, 
the correct surrogate map is obtained (lower panel). 

recover the case of the uniform coverage. Nevertheless, if certain regions of the sky have not 
been covered by a survey or simply they were not visible for a given reason, e.g. obstructed by 
a closer object, this procedure does not apply anymore. Therefore, the phase randomization 
method to be still applied requires some modifications. As an example, we consider the 
recently published 2MRS catalog of galaxies [16], where the region around the galactic plane 
is totally masked. The catalog is 97.6% complete to a limiting magnitude K = 11.75 over 
91% of the sky, and contains almost 45,000 galaxies. The catalog provides a fair sample 
of the nearby Universe when the region corresponding to the Galactic band with \b\ < 5° 
(|6| < 8° around the Galactic center) is excluded. Such a band defines a mask. 

It is straightforward to show that the phase randomization preserves the power spectrum 
of the density map even in the case of an incomplete sky. However, it destroys the information 
about the mask, providing a surrogate map defined on the whole sky. Therefore, an additional 
step should be considered in such cases. 

Let a'l^ indicate the phase-randomized multipole coefficients obtained from the expan- 
sion of the original map f{6, (p), and let f'{9, (p) indicate the resulting surrogate map. Hence, 
we assign a null weight to all pixels of the sphere covering the masked region and a weight 
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proportional to to the remaining pixels, being 0,p the solid angle covered by a single 
pixel. 

Finally, we estimate the new coefficients a"^ which take into account the weights as- 
signed to pixels on the sphere, and we use again the spherical harmonics expansion defined 
by Eq. (3.1) to obtain the new surrogate map f"{0, 4>), masked as the original one. 

In Fig. 4 we show the density map of the 2MRS catalog of galaxies (upper panel). Hence, 
the simple application of the phase-randomization procedure, described at the beginning of 
the present section, produces a surrogate map over the whole sky (middle panel), where 
the information about the mask is lost. However, by using the extended version of our 
procedure, the masking effect is correctly taken into account, producing a surrogate map 
correctly masked (lower panel). 

The angular power spectra corresponding to such three cases are shown in the right 
panel of Fig. 3: it is evident that both our procedures correctly preserve the spectra of the 
original skies. 

4 Discussion and conclusions 

We have presented a new method to generate random distributions of objects on the sphere 
with the same angular clustering of real distributions provided as input. Our method is based 
on the phase randomization of multipole coefficients corresponding to the spherical harmonics 
expansion of a map defined on the sphere. It preserves the angular power spectrum and, by 
consequence, the correlation function, because of the Wiener-Khinchin-Kolmogorov theorem. 
However, the method is model-independent and it does not require the direct estimation of 
the correlation function and the power spectrum of the original data, avoiding the intrinsic 
uncertainty on such estimates. Moreover, no fitting procedures are involved. We have shown 
that the method works on very different sky maps, and we have applied it to the distribution 
of galaxies in the 2MASS Redshift Survey as an application to real- world data. 

The extension to the case of the spatial clustering is straightforward but requires much 
more computational efforts, and it did not represent the main goal of the present study. 
However, such a task can be achieved in a reasonable amount of computational time by 
means of the most advanced techniques for the numerical estimation of fast Fourier transforms 
(FFT) in the multi-dimensional space, based, for instance, on the pencil decomposition. 

Our method can be used in several applications. For instance, it can be used to generate 
sky maps of objects with a given angular power spectrum: the resulting maps can be used 
for several purposes, as to train new or existing methods for clustering detection of different 
types [1, 8-11]. 

Moreover, our method could be useful for estimating the overall expected errors in 
peculiar velocity field, where mock catalogs with the same features of a given real catalog 
are required, as shown in recent studies [30, 31]. 

Finally, the method is suitable to simulate realistic distributions of sources of ultra- 
high energy cosmic rays (UHECRs). In fact, the lack of information about the sources of 
UHECRs has motivated several investigations requiring the simulation of particles reflecting 
special type of clustering [12-15], which can be easily provided by our procedure. 
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